29 research outputs found

    Multi-level audio classification architecture

    A multi-level classification architecture for solving binary discrimination problems is proposed in this paper. The main idea of the proposed solution is derived from the fact that solving one binary discrimination problem multiple times can reduce the overall misclassification error. We aimed our effort at building a classification architecture that combines multiple binary SVM (support vector machine) classifiers to solve the two-class discrimination problem. We therefore developed a binary discrimination architecture employing the SVM classifier (BDASVM), intended for the classification of broadcast news (BN) audio data. The fundamental element of BDASVM is the binary decision (BD) algorithm, which discriminates between each pair of acoustic classes using a decision function modeled by a separating hyperplane. The overall classification accuracy depends on finding the optimal parameters for the discrimination function, which results in higher computational complexity. The final form of the proposed BDASVM is created by combining four BDSVM discriminators supplemented by a decision table. Experimental results show that the proposed classification architecture can decrease the overall classification error in comparison with the binary decision trees SVM (BDTSVM) architecture.

    Comparison of diarization tools for building speaker database

    This paper compares open-source diarization toolkits (LIUM, DiarTK, ALIZE-Lia_Ral), which were designed to extract speaker identity from audio recordings without any prior information about the analysed data. The comparative study of these diarization tools was performed for three different types of analysed data (broadcast news - BN and TV shows). The achieved values of the diarization error rate (DER) are presented. The automatic speaker diarization system developed by LIUM was able to identify speech segments belonging to individual speakers at a very good level. Its segmentation outputs can be used to build a speaker database.
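
The DER measure mentioned above is commonly defined as the share of reference speech time that is missed, falsely detected, or assigned to the wrong speaker. The following is a minimal frame-level sketch of that definition, not the scoring procedure used in the paper. The function name `frame_level_der` and the toy label sequences are hypothetical, and the sketch assumes that hypothesis speaker labels have already been mapped onto reference labels (real scoring tools also search for the best mapping and usually apply a tolerance collar).

```python
def frame_level_der(reference, hypothesis):
    """Diarization error rate over two equal-length frame label sequences.

    Each element is a speaker label, or None for a non-speech frame.
    Assumption: hypothesis labels are already mapped onto reference labels.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("sequences must have the same number of frames")
    missed = false_alarm = confusion = speech = 0
    for ref, hyp in zip(reference, hypothesis):
        if ref is not None:
            speech += 1
            if hyp is None:
                missed += 1        # speech frame labelled as non-speech
            elif hyp != ref:
                confusion += 1     # speech frame given to the wrong speaker
        elif hyp is not None:
            false_alarm += 1       # non-speech frame labelled as speech
    if speech == 0:
        raise ValueError("reference contains no speech frames")
    return (missed + false_alarm + confusion) / speech


# Toy data: one missed frame, one false alarm, one confusion, 7 speech frames.
ref = ["A", "A", "A", None, "B", "B", "B", "B", None, None]
hyp = ["A", "A", None, "A", "B", "A", "B", "B", None, None]
der = frame_level_der(ref, hyp)
print(f"DER = {der:.3f}")
```

Because false alarms are counted in the numerator but not in the denominator, DER can exceed 1 for very poor hypotheses.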

    Classification of Broadcast News Audio Data Employing Binary Decision Architecture

    A novel binary decision architecture (BDA) for the broadcast news audio classification task is presented in this paper. The idea of developing such an architecture came from the fact that an appropriate combination of multiple binary classifiers for the two-class discrimination problem can reduce the misclassification error without a rapid increase in computational complexity. The core element of the classification architecture is a binary decision (BD) algorithm that discriminates between each pair of acoustic classes, using two types of decision functions. The first is a simple rule-based approach in which the final decision is made according to the value of a selected discrimination parameter. The main advantage of this solution is the relatively low processing time needed to classify all acoustic classes; its cost is low classification accuracy. The second employs a support vector machine (SVM) classifier. In this case, the overall classification accuracy depends on finding the optimal parameters for the decision function, which results in higher computational complexity and better classification performance. The final form of the proposed BDA is created by combining four BD discriminators supplemented by a decision table. The effectiveness of the proposed BDA, using both the rule-based approach and the SVM classifier, is compared with the two most popular strategies for multiclass classification, namely binary decision trees (BDT) and One-Against-One SVM (OAOSVM). Experimental results show that the proposed classification architecture can decrease the overall classification error in comparison with the BDT architecture. In contrast, an optimization technique for selecting the optimal set of training data is needed in order to outperform the OAOSVM.
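
The combination of four rule-based BD discriminators with a decision table can be illustrated roughly as follows. This is only a sketch of the general idea: the abstract does not list the discrimination parameters, thresholds, acoustic classes, or table entries, so every feature name, threshold value, and class label below is a hypothetical placeholder.

```python
# Hypothetical discrimination parameters and thresholds (not from the paper).
# Each entry defines one rule-based binary decision (BD) discriminator.
BD_RULES = [
    ("energy", 0.10),
    ("zero_crossing_rate", 0.30),
    ("spectral_flux", 0.50),
    ("harmonicity", 0.40),
]

# Hypothetical decision table: maps the four binary outputs to a class label.
DECISION_TABLE = {
    (0, 0, 0, 0): "silence",
    (1, 0, 0, 1): "music",
    (1, 1, 1, 0): "speech",
    (1, 1, 1, 1): "speech_with_music",
}


def binary_decision(value, threshold):
    """Rule-based BD: compare one discrimination parameter to a threshold."""
    return 1 if value > threshold else 0


def classify(features):
    """Run all four BD discriminators, then resolve them via the table."""
    bits = tuple(binary_decision(features[name], thr) for name, thr in BD_RULES)
    # Combinations missing from the table are reported rather than guessed.
    return DECISION_TABLE.get(bits, "unresolved")


segment = {
    "energy": 0.60,
    "zero_crossing_rate": 0.45,
    "spectral_flux": 0.70,
    "harmonicity": 0.20,
}
label = classify(segment)
print(label)
```

In the SVM variant described in the abstract, `binary_decision` would be replaced by the sign of a trained decision function, while the table lookup would stay the same.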

    Dual shots detection

    This work describes the identification of a special kind of acoustic event, namely dual gunshots and single gunshots against a traffic background. The recognition of dangerous sounds may help to prevent abnormal or criminal activities that happen near public transport stations. Therefore, a methodology for detecting dual shots in a noisy background was developed and evaluated in this paper. For this purpose, we investigated various feature extraction methods and combinations of different feature sets. These approaches were evaluated using the widely used classification technique based on hidden Markov models.

    Improving the Slovak LVCSR performance by cluster-sensitive acoustic model retraining

    In this paper, we present a cluster-dependent adaptation approach for HMM-based acoustic models. The proposed approach employs clustering techniques to group the original training utterances into a predefined number of clusters. The clustered speech data are used to adapt an initially pre-trained acoustic model to the specific cluster by re-estimation based on the standard Baum-Welch procedure. The resulting model, adapted to the homogeneous data, may markedly improve the baseline recognition rate, while the model complexity may be reduced. In the recognition step, the test samples are scored by each adapted model and the most accurate one is chosen. The proposed approach is thoroughly evaluated in a Slovak triphone-based large vocabulary continuous speech recognition (LVCSR) system. The results show that the cluster-sensitive retraining leads to significant improvements over the baseline reference system trained according to the conventional training procedure.
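
The recognition-time selection step (score the test sample with every cluster-adapted model, then keep the best one) can be sketched as below. This is a simplified illustration under stated assumptions: one-dimensional Gaussians stand in for the adapted HMM acoustic models, "most accurate" is interpreted as highest log-likelihood, and the model names and parameters are invented for the example.

```python
import math

# Stand-ins for cluster-adapted acoustic models: (mean, standard deviation).
# Assumption: real systems would score with adapted HMMs, not single Gaussians.
ADAPTED_MODELS = {
    "cluster_0": (0.0, 1.0),
    "cluster_1": (3.0, 1.0),
}


def log_likelihood(sample, mean, std):
    """Log-likelihood of a sequence of scalar features under one Gaussian."""
    var = std * std
    return sum(
        -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)
        for x in sample
    )


def select_model(sample, models):
    """Score the sample with every adapted model and return the best name."""
    scores = {
        name: log_likelihood(sample, mean, std)
        for name, (mean, std) in models.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


test_sample = [2.8, 3.1, 3.4]
best_model, all_scores = select_model(test_sample, ADAPTED_MODELS)
print(best_model)
```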

    Morphological analysis of the Slovak language

    This paper proposes a new statistical method of segmenting words by identifying a suffix. The ability to identify a suffix can improve morphological analysis by allowing the classifier to assign tags to words that were not seen in the training corpus. The identified suffix of a word can be used to improve the accuracy of part-of-speech tagging or other natural language processing tasks.

    Development of the Slovak HMM-Based TTS System and Evaluation of Voices in Respect to the Used Vocoding Techniques

    This paper describes the development of a Slovak text-to-speech system that applies a technique in which speech is synthesized directly from hidden Markov models. Statistical models for Slovak speech units are trained using newly created female and male phonetically balanced speech corpora. In addition, contextual information about phonemes, syllables, words, phrases, and utterances was determined, as well as questions for decision tree-based context clustering algorithms. Recent statistical parametric speech synthesis methods, including the conventional, STRAIGHT, and AHOcoder speech synthesis systems, are implemented and evaluated. Objective evaluation methods (mel-cepstral distortion and fundamental frequency comparison) and subjective ones (mean opinion score and the semantically unpredictable sentences test) are used to compare these systems with each other and to evaluate their overall quality. The result of this work is a set of text-to-speech systems for the Slovak language that are characterized by very good intelligibility and quite good naturalness of the output utterances. In the subjective tests of intelligibility, the STRAIGHT-based female voice and the AHOcoder-based male voice reached the highest scores.
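
Mel-cepstral distortion, one of the objective measures named above, is commonly computed per frame as (10 / ln 10) * sqrt(2 * sum_d (c_d - c'_d)^2) and then averaged over frames. The sketch below follows that common definition; it is not taken from the paper. It assumes that the natural and synthesized frame sequences are already time-aligned and that the 0th (energy) coefficient is excluded, which is a frequent convention. The function name and toy vectors are hypothetical.

```python
import math


def mel_cepstral_distortion(ref_frames, syn_frames):
    """Average mel-cepstral distortion in dB over time-aligned frames.

    Each frame is a list of mel-cepstral coefficients; index 0 (energy)
    is skipped. Assumption: both sequences are already aligned.
    """
    if not ref_frames or len(ref_frames) != len(syn_frames):
        raise ValueError("need two non-empty sequences of equal length")
    scale = 10.0 / math.log(10.0) * math.sqrt(2.0)
    total = 0.0
    for ref, syn in zip(ref_frames, syn_frames):
        if len(ref) != len(syn):
            raise ValueError("frames must have the same dimension")
        squared = sum((r - s) ** 2 for r, s in zip(ref[1:], syn[1:]))
        total += scale * math.sqrt(squared)
    return total / len(ref_frames)


# Toy frames: the first pair differs only in c0, the second by 1.0 in c1.
natural = [[1.0, 0.5, 0.2], [0.0, 1.0, 0.0]]
synthetic = [[0.0, 0.5, 0.2], [0.0, 0.0, 0.0]]
mcd = mel_cepstral_distortion(natural, synthetic)
print(f"MCD = {mcd:.3f} dB")
```

Lower values indicate that the synthesized spectral envelope is closer to the natural one.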

    Categorization of unorganized text corpora for better domain-specific language modeling

    This paper describes the categorization of unorganized text data gathered from the Internet into in-domain and out-of-domain data for better domain-specific language modeling and speech recognition. An algorithm for text categorization and topic detection based on the most frequent key phrases is presented. In this scheme, each document entering the text categorization process is represented by a vector space model with term weighting based on term frequency and inverse document frequency. Text documents are then automatically classified as in-domain or out-of-domain using a predefined threshold and one of the selected distance/similarity measures applied against the list of key phrases. The experimental results for language modeling and adaptation to the judicial domain show a significant improvement in model perplexity of about 19% and a decrease in the word error rate of the Slovak transcription and dictation system of about 5.54%, both in relative terms.
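
The categorization scheme described above (TF-IDF document vectors compared against a key-phrase list, then thresholded) can be sketched as follows. This is an illustration under several assumptions: cosine similarity stands in for "one of the selected distance/similarity measures", single-word key terms stand in for key phrases, and the toy documents, key terms, and threshold value are invented for the example.

```python
import math
from collections import Counter


def tfidf_vectors(token_lists):
    """Return one sparse TF-IDF vector (dict) per tokenized document."""
    n_docs = len(token_lists)
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy corpus, key terms, and threshold.
documents = {
    "d1": "the court issued a judgment in the appeal".split(),
    "d2": "the team won the football match".split(),
    "d3": "the judge heard the appeal in court".split(),
}
key_vector = {term: 1.0 for term in ["court", "judgment", "appeal", "judge"]}
THRESHOLD = 0.3

vectors = tfidf_vectors(list(documents.values()))
labels = {
    name: "in-domain" if cosine(vec, key_vector) >= THRESHOLD else "out-of-domain"
    for name, vec in zip(documents, vectors)
}
print(labels)
```

Documents labelled in-domain would then be added to the training text for the domain-specific language model; the threshold controls the trade-off between the amount and the purity of the selected data.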